Building an electronic language database nowadays: The Prague Dependency Treebank

Authors

  • Jarmila Panevová
  • Ferenc Papp
Abstract

1. I am sure that the creation of large language corpora, in which linguistic annotations are assigned to the input data, still belongs among the interests of the Festschrift's honoree. I met Prof. Ferenc Papp for the first time in 1964 in Prague at the Colloquium on Mathematical Linguistics, and we immediately found a common basis of interest: how to store linguistic data about raw texts and how to process them using the technical equipment of the day. At that time the most advanced technique for natural language processing available to linguists was the punch-card machine. In October 1964, Ferenc Papp organized a small conference in Budapest, where the participants (I had the honour to be one of them) exchanged opinions on what types of data can be stored on punch cards, and how they can be classified and evaluated both from the point of view of their linguistic nature and from the point of view of the efficiency and role of the punch-card machine set. Recalling this in 2000, the year of F. Papp's anniversary, the whole issue sounds like a kind of crazy nostalgia. The punch-card machines disappeared very soon, and the punch cards, which stored a rich inventory of linguistic data, became unreadable. However, the idea remains alive: in the 1980s corpus linguistics was born, and in the 1990s (syntactically) annotated corpora started to develop.

Similar resources

From Sentence to Discourse: Building an Annotation Scheme for Discourse Based on Prague Dependency Treebank

The present paper reports on preparatory research for building a language corpus annotation scenario capturing the discourse relations in Czech. We primarily focus on the description of the syntactically motivated relations in discourse, basing our findings on the theoretical background of the Prague Dependency Treebank 2.0 and the Penn Discourse Treebank 2. Our aim is to revisit the present-...

Sherds from an Arabic Treebanking Mosaic

This paper introduces the reader to those aspects of the Arabic language which require some special treatment compared to languages Europeans are more familiar with. In spite of having fresh experience in building the Prague Arabic Dependency Treebank, the authors try to take a broader view of the problems encountered along the way. The topics discussed include linguistic data retrie...

Learning to Search in Prague Dependency Treebank

We present Netgraph, an easy-to-use tool for searching in linguistically annotated treebanks. Using several examples from the Prague Dependency Treebank, we introduce the features of the search language and show how to search for some frequent linguistic phenomena.

Valency in the Prague Dependency Treebank: Building the Valency Lexicon

In this article we focus on valency, which belongs to the core phenomena captured at the underlying level of the Prague Dependency Treebank (PDT). We present a summary of the basic principles of the applied theoretical framework, including proposals for suitable refinements relevant to NLP. The current status of the description of the valency behavior of verbs, nouns and adjectives is outlined. We ...

Annotation Procedure in Building the Prague Czech-English Dependency Treebank

In this paper, we present some organizational aspects of building a large corpus with rich linguistic annotation, with the Prague Czech-English Dependency Treebank (PCEDT) serving as an example. We stress the necessity of dividing the annotation process into several well-planned phases. We present a system for automatically checking the correctness of the annotation and describe several ways to measu...

Publication date: 2005